The goal of this exploration is to develop a hierarchical Bayesian model to estimate the true case fatality rate (CFR) for each county. In particular, we will start by taking advantage of the grouping of counties within states. The result of the model will be a “denoised” estimate of the CFR for each county in the country.

The initial motivation for this exploration is to use the distribution of the denoised CFR across counties to select the two shape parameters of a beta prior distribution. That prior will enable the analytic calculation of a denoised posterior CFR for an arbitrary county, taking advantage of the conjugacy between a beta prior and a binomial likelihood.
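
The conjugate update mentioned above can be sketched as follows. This is a minimal illustration, not the code used for this report: the function names are hypothetical, the prior shapes are the national fit reported later in this document (2.441 and 105.6), and the county counts are invented for the example.

```python
def beta_binomial_posterior(shape1, shape2, deaths, cases):
    """Update a Beta(shape1, shape2) prior with binomial data.

    With `deaths` successes out of `cases` trials, conjugacy gives a
    Beta(shape1 + deaths, shape2 + cases - deaths) posterior.
    """
    if deaths < 0 or cases < deaths:
        raise ValueError("need 0 <= deaths <= cases")
    return shape1 + deaths, shape2 + (cases - deaths)


def posterior_mean_cfr(shape1, shape2, deaths, cases):
    """Posterior mean of the CFR, i.e. the 'denoised' point estimate."""
    post1, post2 = beta_binomial_posterior(shape1, shape2, deaths, cases)
    return post1 / (post1 + post2)


# Prior shapes taken from the national fit reported later in this document.
prior_shape1, prior_shape2 = 2.441, 105.6

# Hypothetical county (invented numbers): 2 deaths among 10 cases.
deaths, cases = 2, 10
raw_cfr = deaths / cases
adjusted_cfr = posterior_mean_cfr(prior_shape1, prior_shape2, deaths, cases)

print(f"raw CFR      = {raw_cfr:.4f}")
print(f"adjusted CFR = {adjusted_cfr:.4f}")
```

With so few cases, the adjusted value stays much closer to the prior mean than to the raw CFR, which is the shrinkage behaviour this exploration relies on.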

Exploratory plots

Numerical summary of case fatality rates

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.004557 0.014925 0.021795 0.029856 0.500000

Case fatality rates by number of cases

The most extreme CFRs come from counties with small numbers of cases

Median and IQR of county CFR by state

There appears to be meaningful clustering of CFR within states, which suggests that a model with a random effect for state is appropriate.

Modelling

We fit a binomial model to estimate an adjusted (denoised) CFR for each county by shrinking the county CFR towards the state CFR and shrinking the state CFR towards the national CFR. The prior \(N(0,1.6)\) on the intercept (mean 0, standard deviation 1.6) is chosen because this prior on the logit scale is approximately uniform over [0,1] when transformed to the probability scale.
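
The claim about the intercept prior can be checked numerically. The sketch below is an illustration rather than part of the original analysis, and assumes that 1.6 is a standard deviation. It computes how much prior probability falls in each tenth of the probability scale; a perfectly uniform prior would put 0.10 in every tenth.

```python
import math
from statistics import NormalDist


def logit(p):
    """Map a probability in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


# Intercept prior on the logit scale; 1.6 is treated as a standard deviation.
prior = NormalDist(mu=0.0, sigma=1.6)

# P(p <= q) under the prior equals P(z <= logit(q)) for z ~ N(0, 1.6).
edges = [i / 10 for i in range(1, 10)]
cdf_values = [0.0] + [prior.cdf(logit(q)) for q in edges] + [1.0]

# Prior mass in each interval [0.0, 0.1), [0.1, 0.2), ..., [0.9, 1.0].
decile_mass = [cdf_values[i + 1] - cdf_values[i] for i in range(10)]

for i, mass in enumerate(decile_mass):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): {mass:.3f}")
```

The masses are close to 0.10 throughout, with slightly less weight in the two outermost intervals.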

I should note that with these priors, this simple model probably does not require Stan to fit (we could use, e.g., glmer). However, the Stan machinery will be needed if we make the model more complex.

We could in theory obtain more precise estimates by placing a more informative prior on the national CFR. However, the gains in precision would likely be small, given that there is ample data to estimate the national CFR.

Model with intercept and state and county random effects

Priors

##            prior     class      coef group resp dpar nlpar bound
## 1 normal(0, 1.6) Intercept                                      
## 2   normal(0, 1)        sd                                      
## 3                       sd            fips                      
## 4                       sd Intercept  fips                      
## 5                       sd           state                      
## 6                       sd Intercept state

Fit summary

The standard deviation estimates for the state random effect and the county random effect are roughly the same (both about 0.62 on the logit scale) and are relatively large, suggesting that there is meaningful variation in CFR both within and between states.

##  Family: binomial 
##   Links: mu = logit 
## Formula: deaths | trials(cases) ~ (1 | state) + (1 | fips) 
##    Data: dat (Number of observations: 3124) 
## Samples: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
##          total post-warmup samples = 2000
## 
## Group-Level Effects: 
## ~fips (Number of levels: 3124) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.62      0.01     0.60     0.65 1.01      402     1116
## 
## ~state (Number of levels: 51) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.62      0.07     0.51     0.78 1.01      434      859
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -3.96      0.10    -4.14    -3.77 1.02      196      603
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
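
Because the estimates above are on the logit scale, it can help to translate them to probabilities. The sketch below does so using the posterior means from the summary as plug-in values. It is a rough illustration that ignores posterior uncertainty and assumes the state and county effects are independent.

```python
import math


def inv_logit(x):
    """Map a log-odds value back to the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))


# Posterior means copied from the fit summary above (logit scale).
intercept = -3.96
sd_county = 0.62
sd_state = 0.62

# CFR for a county whose state and county effects are both zero.
typical_cfr = inv_logit(intercept)

# Multiplicative change in the odds of death for a one-sd county effect.
odds_ratio_one_sd = math.exp(sd_county)

# Combined spread of state + county effects, assuming independence.
sd_total = math.sqrt(sd_state**2 + sd_county**2)
low_cfr = inv_logit(intercept - 2 * sd_total)
high_cfr = inv_logit(intercept + 2 * sd_total)

print(f"typical county CFR    = {typical_cfr:.4f}")
print(f"odds ratio per 1 sd   = {odds_ratio_one_sd:.2f}")
print(f"approx. +/- 2 sd range = {low_cfr:.4f} to {high_cfr:.4f}")
```

This typical-county value need not match the case-weighted national CFR reported in the next section, since counties with many cases contribute more to the national figure.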

National CFR

The adjusted national CFR is essentially the same as the unadjusted national CFR, because there is ample data to estimate this CFR.

cases deaths CFR CFR_adj
5651568 175196 0.031 0.031

State CFR

The adjusted state CFRs are very close to the unadjusted state CFRs, because there is ample data to estimate these as well. The regularization on the state random effect may become more important if we do something more complex, such as incorporating a time trend that interacts with state.

County CFR

The adjustment on the county-level CFR is much more meaningful. The many points substantially below the diagonal on this plot indicate counties where the unadjusted CFR is very large but the adjusted (denoised) CFR is much more moderate.

Case fatality rates (adjusted and unadjusted) by number of cases

The relationship between CFR and number of cases looks different after adjustment, in that CFRs for counties with few cases have been shrunk to more moderate values. Interestingly, there is a slight positive relationship (approximately linear on the log scale) between CFR and cases, for both adjusted and unadjusted CFR.

Distribution of CFR by state

Hover over densities to see which state they represent.

Fit beta distribution to overall state-level CFR for use as prior in beta-binomial conjugate adjustment of county CFR at future times

First, fit a beta distribution to the national distribution of county-level CFRs.

Parameter estimates

shape1 shape2
2.441 105.6
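
The report does not say which estimator produced these shapes. As one simple possibility, a method-of-moments fit can be sketched as below. This is a swapped-in technique for illustration, with hypothetical function names, and it will not in general reproduce a maximum-likelihood fit exactly.

```python
from statistics import mean, variance


def beta_moments(shape1, shape2):
    """Mean and variance of a Beta(shape1, shape2) distribution."""
    total = shape1 + shape2
    m = shape1 / total
    v = shape1 * shape2 / (total**2 * (total + 1.0))
    return m, v


def shapes_from_moments(m, v):
    """Invert beta_moments: recover (shape1, shape2) from mean and variance."""
    if not 0.0 < m < 1.0 or not 0.0 < v < m * (1.0 - m):
        raise ValueError("moments are not attainable by a beta distribution")
    common = m * (1.0 - m) / v - 1.0  # equals shape1 + shape2
    return m * common, (1.0 - m) * common


def fit_beta_method_of_moments(values):
    """Fit a beta distribution by matching the sample mean and variance."""
    return shapes_from_moments(mean(values), variance(values))


# Moments implied by the national fit reported above.
fitted_mean, fitted_var = beta_moments(2.441, 105.6)
print(f"fitted mean = {fitted_mean:.5f}, fitted sd = {fitted_var ** 0.5:.5f}")

# Toy input (invented values, not the report's data).
toy_cfrs = [0.01, 0.02, 0.03, 0.04]
toy_shape1, toy_shape2 = fit_beta_method_of_moments(toy_cfrs)
print(f"toy fit: shape1 = {toy_shape1:.3f}, shape2 = {toy_shape2:.3f}")
```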

Empirical density versus fitted distribution

Fit a separate distribution per state

state shape1 shape2 n_counties deaths cases mean_CFR mean_CFR_fitted
DE 63.47 1643 3 604 16667 0.03736 0.03718
HI 5.548 521.9 4 48 6746 0.007492 0.01052
RI 3.571 64.26 5 1008 19133 0.05274 0.05264
CT 4.887 57.01 8 4460 51777 0.079 0.07895
NH 7.02 182.9 10 429 7134 0.03083 0.03697
MA 8.152 97.21 14 8944 124612 0.07371 0.07737
VT 12.61 461.1 14 58 1558 0.0186 0.02662
AZ 12.01 357.2 15 4771 198413 0.0327 0.03254
ME 3.211 88.63 16 131 4356 0.03208 0.03497
NV 11.9 632.3 16 1200 66010 0.01393 0.01847
NJ 14.39 148.9 21 15946 189401 0.08823 0.08809
WY 8.54 744.3 23 37 3603 0.01607 0.01134
MD 5.191 140.7 24 3684 104669 0.03457 0.03557
AK 33.69 3846 26 32 4808 0.008152 0.008684
UT 4.284 518.4 28 390 49390 0.007268 0.008196
NM 3.85 154.3 32 747 23153 0.02931 0.02434
OR 7.728 496.4 35 420 25155 0.01262 0.01533
WA 4.674 242.6 39 1863 71125 0.0182 0.0189
ID 3.746 340.2 44 314 30062 0.007887 0.01089
SC 6.771 252.1 46 2511 112551 0.02632 0.02616
ND 19.29 1633 53 137 10000 0.008002 0.01167
MT 9.564 655.6 54 91 6489 0.0111 0.01438
WV 4.714 232.6 55 179 9312 0.01893 0.01986
CA 4.819 336.5 59 12257 675898 0.01288 0.01412
NY 3.348 64.39 62 32468 430145 0.04815 0.04942
CO 5.691 208.8 63 1919 55321 0.02202 0.02653
LA 6.327 191.7 64 4623 143319 0.03158 0.03195
SD 11.23 843.6 66 161 11425 0.009661 0.01314
AL 3.287 160.3 67 2024 116710 0.0206 0.02009
FL 4.528 271.6 67 10397 601978 0.01604 0.0164
PA 4.364 98.98 67 7579 129474 0.0393 0.04223
WI 4.99 405.7 72 1081 70854 0.01328 0.01215
AR 3.342 211.4 75 696 56068 0.01574 0.01557
OK 5.337 338.3 77 730 53511 0.01483 0.01553
MS 6.704 209.9 82 2248 78405 0.03098 0.03096
MI 4.391 113.7 83 6588 102407 0.03328 0.03717
MN 3.319 202.6 87 1771 70171 0.01513 0.01612
NE 4.815 323.1 88 382 31879 0.01214 0.01468
OH 2.918 83.79 88 3986 115651 0.0328 0.03366
IN 3.308 93.43 92 3012 87606 0.03294 0.03419
TN 7.198 641.1 95 1560 138382 0.01062 0.0111
IA 3.801 199.8 99 1040 56583 0.01838 0.01867
NC 5.451 300.5 100 2535 156396 0.01736 0.01782
IL 4.585 208.7 102 7888 221742 0.01793 0.0215
KS 5.97 518.9 105 432 38732 0.012 0.01137
MO 5.126 486.6 115 1426 75914 0.009764 0.01043
KY 4.01 196.2 120 885 43899 0.01818 0.02003
VA 3.673 159.3 133 2471 113630 0.02178 0.02254
GA 3.611 133.1 159 5041 236944 0.02689 0.02641
TX 5.208 202.2 251 11388 588761 0.02832 0.02511
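
The mean_CFR_fitted column appears consistent with the mean of the fitted beta distribution, shape1 / (shape1 + shape2). The sketch below checks this for a few rows copied from the table. The interpretation of the column is an assumption, and small discrepancies are expected because the table rounds the shapes to four significant figures.

```python
def beta_mean(shape1, shape2):
    """Mean of a Beta(shape1, shape2) distribution."""
    return shape1 / (shape1 + shape2)


# (shape1, shape2, mean_CFR_fitted) copied from the table above.
rows = {
    "DE": (63.47, 1643.0, 0.03718),
    "HI": (5.548, 521.9, 0.01052),
    "CT": (4.887, 57.01, 0.07895),
    "TX": (5.208, 202.2, 0.02511),
}

differences = {}
for state, (shape1, shape2, reported) in rows.items():
    implied = beta_mean(shape1, shape2)
    differences[state] = abs(implied - reported)
    print(f"{state}: implied = {implied:.5f}, reported = {reported:.5f}")
```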

Empirical density versus fitted distribution by state

The number in parentheses after each state indicates the number of counties.

Compare adjusted CFR from model to adjusted CFR computed using empirical Bayes with the state-specific beta prior

Plot adjusted CFR from model versus from EB

Adjusted CFRs from EB are generally lower than those from the model, where the two estimates differ. To understand how similar these two adjusted estimates are relative to the unadjusted CFR, we have to look at the unadjusted CFR as well.

Plot all three CFRs for each county (ordered by model-adjusted CFR)

In most cases, the two adjusted CFRs are similar (relative to the unadjusted CFR). The Empirical Bayes adjustment tends to shrink the CFR to slightly lower values than the model adjustment.

One possible explanation for differences between the two adjustment methods is that in the model, there is a single variance for the county random effects across states, while in the EB method, the fitted beta distributions have differing variances by state.
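
To illustrate how state-specific priors can shrink by different amounts, note that the EB posterior mean is a weighted average of the prior mean and the raw CFR, with weight (shape1 + shape2) / (shape1 + shape2 + cases) on the prior. The sketch below compares two states from the table using a hypothetical county with invented counts; the function names are illustrative.

```python
def shrinkage_weight(shape1, shape2, cases):
    """Weight placed on the prior mean in the beta-binomial posterior mean."""
    prior_strength = shape1 + shape2  # acts like a prior sample size
    return prior_strength / (prior_strength + cases)


def eb_posterior_mean(shape1, shape2, deaths, cases):
    """Posterior mean CFR under a Beta(shape1, shape2) prior."""
    return (shape1 + deaths) / (shape1 + shape2 + cases)


# State-specific shapes copied from the table of per-state fits.
state_priors = {"DE": (63.47, 1643.0), "CT": (4.887, 57.01)}

# Hypothetical county (invented numbers): 5 deaths among 100 cases.
deaths, cases = 5, 100
raw_cfr = deaths / cases

weights = {}
for state, (shape1, shape2) in state_priors.items():
    w = shrinkage_weight(shape1, shape2, cases)
    prior_mean = shape1 / (shape1 + shape2)
    blended = w * prior_mean + (1.0 - w) * raw_cfr
    weights[state] = w
    print(f"{state}: weight on prior = {w:.3f}, adjusted CFR = {blended:.4f}")
```

A state whose fitted shapes sum to a large value pulls its counties strongly towards the state mean, while a state with a small sum lets the county data dominate. The model, by contrast, uses one county-level standard deviation for all states.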

Now plot the adjusted CFR from EB versus the adjusted CFR from the model, by state

Compare underreporting factors estimated using adjusted and unadjusted CFR

The underreporting factors are computed assuming a true mortality rate of 0.0138. The distribution of estimated underreporting factors is much more heavy-tailed for the unadjusted CFRs than for the adjusted CFRs.
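
The report does not state the formula for the underreporting factor. One common definition, assumed here purely for illustration, is the ratio of the observed CFR to the assumed true mortality rate. Under that assumption, the sketch below shows why noisy unadjusted CFRs produce a heavy tail.

```python
# Assumed true mortality rate, as stated in the text.
TRUE_MORTALITY_RATE = 0.0138


def underreporting_factor(cfr, true_rate=TRUE_MORTALITY_RATE):
    """Ratio of observed CFR to the true rate (an assumed definition)."""
    if true_rate <= 0:
        raise ValueError("true_rate must be positive")
    return cfr / true_rate


# National CFR reported earlier in this document.
national_factor = underreporting_factor(0.031)

# Largest unadjusted county CFR from the numerical summary.
extreme_factor = underreporting_factor(0.5)

print(f"national CFR 0.031 -> factor {national_factor:.2f}")
print(f"extreme CFR 0.500  -> factor {extreme_factor:.2f}")
```

Because the factor is linear in the CFR under this definition, any extreme CFR from a small county maps directly to an extreme factor, and shrinking the CFR shrinks the factor by the same proportion.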